Parallel Search Using Partitioned Inverted Files
Authors
Abstract
We examine the search of partitioned inverted files, with particular emphasis on issues that arise from different partitioning methods. Two types of index partitioning are investigated: TermId and DocId. We describe the search operations implemented to support parallelism in probabilistic search. We also describe higher-level features of parallel search methods, such as search topologies. The results from runs on the two types of partitioning are compared and contrasted. We conclude that, within our framework, the DocId method is the best.
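The two partitioning schemes named in the abstract can be illustrated with a toy sketch. It is not the paper's implementation. It assumes that DocId partitioning gives each partition its own subset of documents, and that TermId partitioning gives each partition the complete posting lists for a subset of terms. All function names, the round-robin assignment, and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to a sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def partition_by_doc(docs, n):
    """DocId-style (assumed): each partition indexes its own subset of documents."""
    shards = [{} for _ in range(n)]
    for doc_id, text in docs.items():
        shards[doc_id % n][doc_id] = text  # round-robin is an illustrative choice
    return [build_index(shard) for shard in shards]


def partition_by_term(docs, n):
    """TermId-style (assumed): each partition holds whole posting lists for some terms."""
    full = build_index(docs)
    parts = [{} for _ in range(n)]
    for i, term in enumerate(sorted(full)):
        parts[i % n][term] = full[term]
    return parts


def search(partitions, term):
    """Query every partition and merge the partial posting lists."""
    hits = []
    for part in partitions:
        hits.extend(part.get(term, []))
    return sorted(hits)


def partitions_touched(partitions, term):
    """Count the partitions that hold postings for the term."""
    return sum(1 for part in partitions if term in part)


docs = {
    0: "parallel search",
    1: "inverted files",
    2: "parallel inverted index",
    3: "search index",
}
doc_parts = partition_by_doc(docs, 2)
term_parts = partition_by_term(docs, 2)
```

In this sketch both layouts return the same documents for a single-term query. The difference is in how the work is spread: under the DocId layout a term's postings may be split across several partitions, whereas under the TermId layout they always sit in exactly one.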
Similar resources
Scheduling Intersection Queries in Term Partitioned Inverted Files
This paper proposes and presents a comparison of scheduling algorithms applied to the context of load balancing the query traffic on distributed inverted files. We put emphasis on queries requiring intersection of posting lists, which is a very demanding case for the term partitioned inverted file and a case in which the document partitioned inverted file used by current search engines can perf...
Parallel methods for the update of partitioned inverted files
Purpose – An issue which tends to be ignored in information retrieval is the issue of updating inverted files. This is largely because inverted files were devised to provide fast query service, and much work has been done with the emphasis strongly on queries. In this paper we study the effect of using parallel methods for the update of inverted files in order to reduce costs, by looking at two...
Parallel methods for the generation of partitioned inverted files
The generation of inverted indexes is one of the most computationally intensive activities for information retrieval (IR) systems: indexing large multigigabyte text databases can take many hours or even days to complete. We examine the generation of partitioned inverted files in order to speed up the process of indexing. We describe the components of PLIERS, the system used to index the documen...
An Effective Approach to Temporally Anchored Information Retrieval
We consider in this paper the information retrieval problem over a collection of time-evolving documents such that the search has to be carried out based on a query text and a temporal specification. A solution to this problem is critical for a number of emerging large scale applications involving archived collections of web contents, social network interactions, blog traffic, and information f...
Index Structures for Distributed Text Databases
The Web has become a ubiquitous resource for distributed computing, making it relevant to investigate new ways of providing efficient access to services available at dedicated sites. Efficiency is an ever-increasing demand which can only be satisfied with the development of parallel algorithms which are efficient in practice. This tutorial paper focuses on the design, analysis and implementatio...